A Model for High-coverage Lexical Semantic Annotation Generation

Authors

  • Attila Novák
  • Borbála Siklósi
Abstract

AI applications often receive their input as natural language text or as transcribed speech. A commonsense inference system should transform such input into a formal representation with a limited vocabulary in order to process it. In this paper, we present a method based on neural word embeddings that automatically assigns semic features to words of natural language. These features either describe the ontological category of a given word or provide some characterization or additional information. We show that our method has high coverage, performs well for English and Hungarian, and can easily be extended to other languages.

Similar resources

Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages

The last two decades have seen the development of various semantic lexical resources such as WordNet (Miller, 1995) and the USAS semantic lexicon (Rayson et al., 2004), which have played an important role in the areas of natural language processing and corpus-based studies. Recently, increasing efforts have been devoted to extending the semantic frameworks of existing lexical knowledge resource...

Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study

“Lightweight” semantic annotation of text calls for a simple representation, ideally without requiring a semantic lexicon to achieve good coverage in the language and domain. In this paper, we repurpose WordNet’s supersense tags for annotation, developing specific guidelines for nominal expressions and applying them to Arabic Wikipedia articles in four topical domains. The resulting corpus has ...

Evaluating Lexical Resources for a Semantic Tagger

Semantic lexical resources play an important part in both linguistic study and natural language engineering. In Lancaster, a large semantic lexical resource has been built over the past 14 years, which provides a knowledge base for the USAS semantic tagger. Capturing semantic lexicological theory and empirical lexical usage information extracted from corpora, the Lancaster semantic lexicon prov...

Development of the Multilingual Semantic Annotation System

This paper reports on our research to generate multilingual semantic lexical resources and develop multilingual semantic annotation software, which assigns each word in running text to a semantic category based on a lexical semantic classification scheme. Such tools have an important role in developing intelligent multilingual NLP, text mining and ICT systems. In this work, we aim to extend an ...

Data-Driven Learning in an Incremental Grammar Framework

Overview: Incremental processing of both syntax and semantics, both in parsing and generation, is of significant interest for modelling the human language capability, and for building systems which interact with it. Formal linguistics has made significant contributions to this; one example is the framework Dynamic Syntax, which provides an inherently word-by-word incremental grammatical framewor...

Publication date: 2017